The Arithmetic Mean (or simply Mean) \(\bar{x}\), Geometric Mean \(\operatorname{GM}(x)\), Harmonic Mean \(\operatorname{HM}(x)\), Median \(\operatorname{median}(x)\) and Mode \(\operatorname{mode}(x)\) are some measures of central tendency in a sample.
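As a minimal sketch of these measures in base R, assuming a small made-up sample `x` (not the article's data): base R has no built-in for the geometric mean, harmonic mean or statistical mode, so they are written out from their definitions here.

```r
# Hypothetical sample; not taken from the article's data set.
x <- c(2, 4, 4, 8, 16)

arith_mean <- mean(x)                  # sum(x) / n
geo_mean   <- exp(mean(log(x)))        # nth root of the product; requires x > 0
harm_mean  <- length(x) / sum(1 / x)   # n over the sum of reciprocals; requires x != 0
med        <- median(x)

# Base R's mode() reports the storage mode, not the statistical mode,
# so take the most frequent value(s) from a frequency table instead.
freq   <- table(x)
mode_x <- as.numeric(names(freq)[freq == max(freq)])

print(c(mean = arith_mean, GM = geo_mean, HM = harm_mean, median = med))
print(mode_x)
```

For positive data the three means are ordered as \(\operatorname{HM}(x) \le \operatorname{GM}(x) \le \bar{x}\), which gives a quick sanity check on the results.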
Range \(\operatorname{range}(x)\), Semi-Interquartile Range \(\operatorname{SIR}(x)\), Mean Deviation about \(x'\) \(\operatorname{MD}_{(x')}(x)\), Variance \(s_x^2\) and Standard Deviation \(s_x\) are some measures of dispersion in a sample.
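The dispersion measures can be sketched the same way, again on a made-up sample. The helper `mean_dev()` is a hypothetical name introduced for illustration. `IQR()` uses R's default quantile definition (type 7), so other quartile conventions may give a slightly different semi-interquartile range.

```r
# Hypothetical sample; not taken from the article's data set.
x <- c(2, 4, 4, 8, 16)

rng <- max(x) - min(x)   # range as one number; base range() returns c(min, max)
sir <- IQR(x) / 2        # semi-interquartile range (default quantile type 7)

# Mean deviation about a chosen point x' (hypothetical helper, defaults to the mean).
mean_dev <- function(x, about = mean(x)) mean(abs(x - about))
md_mean   <- mean_dev(x)
md_median <- mean_dev(x, about = median(x))

v <- var(x)              # sample variance, n - 1 denominator
s <- sd(x)               # sample standard deviation, sqrt(var(x))

print(c(range = rng, SIR = sir, MD_mean = md_mean, MD_median = md_median,
        variance = v, sd = s))
```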
Covariance \(\operatorname{cov}(x, y)\) is a measure of the joint variability of two random variables \(x\) and \(y\).
Correlation is any statistical relationship, causal or spurious, between two random variables \(x\) and \(y\). Pearson’s correlation coefficient \(r\) measures the strength and direction of their linear correlation.
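A short sketch of both quantities, computed from their definitions and compared with R's built-in `cov()` and `cor()`. The vectors `x` and `y` are made-up illustration data.

```r
# Hypothetical paired sample; not taken from the article's data set.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
n <- length(x)

# Sample covariance: average product of deviations, with an n - 1 denominator.
cov_xy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)

# Pearson's r: covariance rescaled by both standard deviations, so -1 <= r <= 1.
r_xy <- cov_xy / (sd(x) * sd(y))

print(c(cov = cov_xy, r = r_xy))
print(c(cov = cov(x, y), r = cor(x, y)))   # built-ins should agree
```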
A strong linear correlation suggests that a line of best fit can describe the relationship, so we try to find one.
Regression
Simple Linear Regression
Simple Univariate Linear Regression is a method for estimating the relationship \(y_i = f(x_i)\) between a response variable \(y\) and a predictor variable \(x\) as a line that closely fits the \(y\) vs. \(x\) scatter plot.
\[
y_i = \hat{a} + \hat{b} x_i + e_i
\]
where \(\hat{a}\) is the intercept, \(\hat{b}\) is the slope, and \(e_i\) is the \(i\)th residual. We aim to make the residuals \(e_i\) small overall for a better fit.
Ordinary Least Squares
The Ordinary Least Squares (OLS) method keeps the residuals \(e_i\) small by minimizing the error sum of squares \(\sum{e_i^2}\).
Code
library(ggplot2)  # waiver()
library(knitr)    # kable()

# Fit y_map ~ x_map by OLS and return c(intercept, slope, R^2).
# d, x_lab, y_lab and title are accepted but not used in this function.
olssmry = function(
    d, x_map, y_map,
    x_lab = waiver(), y_lab = waiver(), title = waiver()) {
  model = lm(formula = y_map ~ x_map)
  smry = summary(model, signif.stars = TRUE)
  smryvec = c(
    as.numeric(model$coefficients["(Intercept)"]),
    as.numeric(model$coefficients["x_map"]),
    smry$r.squared
  )
  return(smryvec)
}

# One row per model: response vs. predictor.
olstab = t(data.frame(
  SvG = olssmry(d, d$lngdp, d$snt),
  LvG = olssmry(d, d$lngdp, d$lfx),
  LvS = olssmry(d, d$snt, d$lfx)
))
row.names(olstab) = c(
  "*Sanitation vs. ln(GDP)*",
  "*Life Exp. vs. ln(GDP)*",
  "*Life Exp. vs. Sanitation*"
)
kable(
  olstab,
  digits = 5,
  col.names = c("$\\hat{a}$", "$\\hat{b}$", "$R^2$")
)
| Model | \(\hat{a}\) | \(\hat{b}\) | \(R^2\) |
|---|---|---|---|
| Sanitation vs. ln(GDP) | -70.79844 | 16.77006 | 0.65059 |
| Life Exp. vs. ln(GDP) | 30.24203 | 4.71876 | 0.59643 |
| Life Exp. vs. Sanitation | 53.22795 | 0.23907 | 0.66180 |
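For a single predictor, minimizing \(\sum{e_i^2}\) has a closed-form solution: \(\hat{b} = \operatorname{cov}(x, y) / s_x^2\) and \(\hat{a} = \bar{y} - \hat{b}\bar{x}\), with \(R^2 = 1 - \sum{e_i^2} / \sum{(y_i - \bar{y})^2}\). The sketch below checks these formulas against `lm()` on made-up data. It is an illustration only and does not use the article's data set.

```r
# Hypothetical paired sample; not taken from the article's data set.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Closed-form OLS estimates for one predictor.
b_hat <- cov(x, y) / var(x)
a_hat <- mean(y) - b_hat * mean(x)

# Residuals and the coefficient of determination.
e  <- y - (a_hat + b_hat * x)
r2 <- 1 - sum(e^2) / sum((y - mean(y))^2)

# lm() should reproduce the same numbers.
fit <- lm(y ~ x)
print(c(a = a_hat, b = b_hat, R2 = r2))
print(c(coef(fit), R2 = summary(fit)$r.squared))
```

With an intercept in the model, \(R^2\) equals the square of Pearson's \(r\) between \(x\) and \(y\).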
Least Absolute Deviation
The Least Absolute Deviation (LAD) method keeps the residuals \(e_i\) small by minimizing the sum of absolute deviations \(\sum{|e_i|}\).
Code
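library(ggplot2)   # waiver()
library(quantreg)  # rq()
library(knitr)     # kable()

# Fit y_map ~ x_map by LAD (median regression, tau = 0.5) and return c(intercept, slope).
# d, x_lab, y_lab and title are accepted but not used in this function.
ladsmry = function(
    d, x_map, y_map,
    x_lab = waiver(), y_lab = waiver(), title = waiver()) {
  model = rq(formula = y_map ~ x_map, tau = 0.5)
  smry = summary(model)  # kept from the original; not used below
  smryvec = c(
    as.numeric(model$coefficients[1]),
    as.numeric(model$coefficients[2])
  )
  return(smryvec)
}

# One row per model: response vs. predictor.
ladtab = t(data.frame(
  SvG = ladsmry(d, d$lngdp, d$snt),
  LvG = ladsmry(d, d$lngdp, d$lfx),
  LvS = ladsmry(d, d$snt, d$lfx)
))
row.names(ladtab) = c(
  "*Sanitation vs. ln(GDP)*",
  "*Life Exp. vs. ln(GDP)*",
  "*Life Exp. vs. Sanitation*"
)
kable(
  ladtab,
  digits = 5,
  col.names = c("$\\hat{a}$", "$\\hat{b}$")
)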
| Model | \(\hat{a}\) | \(\hat{b}\) |
|---|---|---|
| Sanitation vs. ln(GDP) | -71.23153 | 16.80472 |
| Life Exp. vs. ln(GDP) | 31.99047 | 4.61340 |
| Life Exp. vs. Sanitation | 53.73041 | 0.23963 |
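Unlike OLS, the LAD line has no closed form. For a single predictor, though, some optimal line always passes through at least two data points, so a small sample can be solved by trying every pair. The sketch below is a base-R illustration of that idea on made-up data. `lad_fit()` is a hypothetical helper, not the method `rq()` uses, and its cubic cost makes it suitable only for small \(n\).

```r
# Brute-force LAD for one predictor: try the line through every pair of points
# and keep the one with the smallest sum of absolute deviations (SAD).
lad_fit <- function(x, y) {
  best <- c(a = NA_real_, b = NA_real_)
  best_sad <- Inf
  pairs <- combn(length(x), 2)
  for (k in seq_len(ncol(pairs))) {
    i <- pairs[1, k]
    j <- pairs[2, k]
    if (x[i] == x[j]) next                 # skip vertical lines
    b <- (y[j] - y[i]) / (x[j] - x[i])
    a <- y[i] - b * x[i]
    sad <- sum(abs(y - (a + b * x)))
    if (sad < best_sad) {
      best_sad <- sad
      best <- c(a = a, b = b)
    }
  }
  list(coef = best, sad = best_sad)
}

# Hypothetical sample whose last point is an outlier.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 20)

lad <- lad_fit(x, y)
ols <- coef(lm(y ~ x))
ols_sad <- sum(abs(y - (ols[1] + ols[2] * x)))

print(lad$coef)
print(ols)
print(c(LAD_SAD = lad$sad, OLS_SAD = ols_sad))
```

Because absolute deviations grow linearly rather than quadratically, a single outlier pulls the LAD line less than it pulls the OLS line.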
Line Fitting
The estimated linear models are plotted over the corresponding scatter plots.
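A minimal sketch of such a plot, assuming base R graphics and made-up data (the article's own plotting code is not shown here and may use a different package). The figure is written to a temporary PDF file, so the code also runs in a non-interactive session.

```r
# Hypothetical paired sample; not taken from the article's data set.
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)
fit <- lm(y ~ x)

out <- tempfile(fileext = ".pdf")   # illustrative output location
pdf(out)
plot(x, y, xlab = "x", ylab = "y", main = "Scatter plot with OLS line")
abline(fit, col = "red")            # abline() reads intercept and slope from the lm object
invisible(dev.off())
```

A line from any other estimator can be added in the same way with `abline(a = intercept, b = slope)` before the device is closed.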